90 research outputs found
Chaining Test Cases for Reactive System Testing (extended version)
Testing of synchronous reactive systems is challenging because long input
sequences are often needed to drive them into a state at which a desired
feature can be tested. This is particularly problematic in on-target testing,
where a system is tested in its real-life application environment and the time
required for resetting is high. This paper presents an approach to discovering
a test case chain---a single software execution that covers a group of test
goals and minimises overall test execution time. Our technique targets the
scenario in which test goals for the requirements are given as safety
properties. We give conditions for the existence and minimality of a single
test case chain and minimise the number of test chains if a single test chain
is infeasible. We report experimental results with a prototype tool for C code
generated from Simulink models and compare it to state-of-the-art test suite
generators.Comment: extended version of paper published at ICTSS'1
Learning Concise Models from Long Execution Traces
Abstract models of system-level behaviour have applications in design
exploration, analysis, testing and verification. We describe a new algorithm
for automatically extracting useful models, as automata, from execution traces
of a HW/SW system driven by software exercising a use-case of interest. Our
algorithm leverages modern program synthesis techniques to generate predicates
on automaton edges, succinctly describing system behaviour. It employs trace
segmentation to tackle complexity for long traces. We learn concise models
capturing transaction-level, system-wide behaviour--experimentally
demonstrating the approach using traces from a variety of sources, including
the x86 QEMU virtual platform and the Real-Time Linux kernel
Unbounded safety verification for hardware using software analyzers
Demand for scalable hardware verification is ever-increasing. We propose an unbounded safety verification framework for hardware, at the heart of which is a software verifier. To this end, we synthesize Verilog at register transfer level into a software-netlist, represented as a word-level ANSI-C program. The proposed tool flow allows us to leverage the precision and scalability of state-of-the-art software verification techniques. In particular, we evaluate unbounded proof techniques, such as predicate abstraction, k-induction, interpolation, and IC3/PDR; and we compare the performance of verification tools from the hardware and software domains that use these techniques. To the best of our knowledge, this is the first attempt to perform unbounded verification of hardware using software analyzers
Hardware/Software Co-verification Using Path-based Symbolic Execution
Conventional tools for formal hardware/software co-verification use
bounded model checking techniques to construct a single monolithic propositional formula. Formulas generated in this way are extremely complex and contain a great deal of irrelevant logic, hence
are difficult to solve even by the state-of-the-art Satisfiability (SAT)
solvers. In a typical hardware/software co-design the firmware only
exercises a fraction of the hardware state-space, and we can use this
observation to generate simpler and more concise formulas. In this
paper, we present a novel verification algorithm for hardware/software co-designs that identify partitions of the firmware and the hardware logic pertaining to the feasible execution paths by means of
path-based symbolic simulation with custom path-pruning, propertyguided slicing and incremental SAT solving. We have implemented
this approach in our tool COVERIF. We have experimentally compared COVERIF with HW-CBMC, a monolithic BMC based co-verification
tool, and observed an average speed-up of 5× over HW-CBMC for
proving safety properties as well as detecting critical co-design bugs
in an open-source Universal Asynchronous Receiver Transmitter
design and a large SoC design
DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning
This paper proposes DeepSynth, a method for effective training of deep
Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian,
but at the same time progress towards the reward requires achieving an unknown
sequence of high-level objectives. Our method employs a novel algorithm for
synthesis of compact automata to uncover this sequential structure
automatically. We synthesise a human-interpretable automaton from trace data
collected by exploring the environment. The state space of the environment is
then enriched with the synthesised automaton so that the generation of a
control policy by deep RL is guided by the discovered structure encoded in the
automaton. The proposed approach is able to cope with both high-dimensional,
low-level features and unknown sparse non-Markovian rewards. We have evaluated
DeepSynth's performance in a set of experiments that includes the Atari game
Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of
two orders of magnitude in the number of iterations required for policy
synthesis, and also a significant improvement in scalability.Comment: Extended version of AAAI 2021 pape
Recalibrating classifiers for interpretable abusive content detection
Dataset and code for the paper, 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020) -- to appear at the NLP + CSS workshop at EMNLP 2020.
We provide:
1,000 annotated tweets, sampled using the Davidson classifier with 20 0.05 increments (50 from each) from a dataset of tweets directed against MPs in the UK 2017 General Election
1,000 annotated tweets, sampled using the Perspective classifier with 20 0.05 increments (50 from each) from a dataset of tweets directed against MPs in the UK 2017 General Election
Code for recalibration in R and STAN.
Annotation guidelines for both datasets.
Paper abstract
We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use to understand the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election.
A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration is integral to the application of abusive content classifiers
- …